Natural Language Processing Using Neighbour Entropy-based Segmentation
Abstract
In natural language processing (NLP) of Chinese hazard text collected in the process of hazard identification, Chinese word segmentation (CWS) is the first step to extracting meaningful information from such semi-structured texts. This paper proposes a new neighbor entropy-based segmentation (NES) model for CWS. The model considers the segmentation benefits of neighbor entropies, adopting the concept of "neighbor" from optimization research. It is defined by the benefit ratio of segmentation, including the benefits and losses of combining a segmentation unit with more information than other popular statistical models. In the experiments performed, together with a maximum-based segmentation algorithm, the NES model achieves 99.3% precision, 98.7% recall, and 99.0% F-measure for segmentation; these performances are higher than those of existing tools based on seven popular statistical models. Results show that the NES model is valid for CWS, especially for segmentation requirements necessitating longer-sized characters. The corpus used comes from the Beijing Municipal Administration of Work Safety and was recorded in the fourth quarter of 2018.
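As a rough illustration of the quantities the abstract mentions, the sketch below computes the left and right neighbor entropy of a candidate string and the F-measure from precision and recall. It is not the paper's NES model: the abstract does not give the benefit-ratio formula, so only the standard Shannon entropy over adjacent characters is shown. The function names and the tiny corpus are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of a Counter of observed neighbors."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def neighbor_entropies(corpus, candidate):
    """Return (left_entropy, right_entropy) for a candidate string.

    Assumption: a "neighbor" is the single character immediately before
    or after each occurrence of the candidate in the corpus.
    """
    left, right = Counter(), Counter()
    for text in corpus:
        start = text.find(candidate)
        while start != -1:
            end = start + len(candidate)
            if start > 0:
                left[text[start - 1]] += 1
            if end < len(text):
                right[text[end]] += 1
            start = text.find(candidate, start + 1)
    return shannon_entropy(left), shannon_entropy(right)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus, for illustration only.
toy_corpus = ["abcx", "abcy", "zbcx"]
print(neighbor_entropies(toy_corpus, "bc"))
print(round(f_measure(0.993, 0.987), 3))
```

A candidate whose neighbors vary widely on both sides (high entropy) tends to behave like a free-standing unit, which is the usual intuition behind entropy-based segmentation. Applying `f_measure` to the reported 99.3% precision and 98.7% recall is consistent with the reported 99.0% F-measure.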
Similar resources
Segmentation Standard for Chinese Natural Language Processing
This paper proposes a segmentation standard for Chinese natural language processing. The standard is proposed to achieve linguistic felicity, computational feasibility, and data uniformity. Linguistic felicity is maintained by defining a segmentation unit to be equivalent to the theoretical definition of word, and by providing a set of segmentation principles that are equivalent to a functional...
A Maximum Entropy Approach to Natural Language Processing
The concept of maximum entropy can be traced back along multiple threads to Biblical times. Only recently, however, have computers become powerful enough to permit the widescale application of this concept to real world problems in statistical estimation and pattern recognition. In this paper, we describe a method for statistical modeling based on maximum entropy. We present a maximum-likelihoo...
Statistical Natural Language Processing Method for Variant Texts Segmentation
It is well known that some techniques have already been developed to automatically subdivide texts into multiparagraph subtopic passages, such as TextTiling methodology proposed by Hearst. However, an additional algorithm is needed to perform a similar task for parallel or variant texts, because ambiguous and complicated traces of cross citation among them might often generate some sinuous patt...
Unsupervised Natural Language Processing Using Graph Models
In the past, NLP has always been based on the explicit or implicit use of linguistic knowledge. In classical computer linguistic applications explicit rule based approaches prevail, while machine learning algorithms use implicit knowledge for generating linguistic knowledge. The question behind this work is: how far can we go in NLP without assuming explicit or implicit linguistic knowledge? Ho...
Journal
Journal title: Journal of Computing and Information Technology
Year: 2022
ISSN: 1846-3908, 1330-1136
DOI: https://doi.org/10.20532/cit.2021.1005393